Covid - CA flexdashboard (documentation)
## ── Attaching packages ───────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.2 ✓ dplyr 1.0.0.9000
## ✓ tidyr 1.1.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Packages
Updates (version 0.1.3)
- Wrangling steps have been moved into
helpers.Rscript (includes bothimportandwranglescripts from thecodefolder)
- Changed contents of
valueBox()s to sentence case
- Updated global maps with
orthographicandMercator
- Moved worldwide cases to page 2, and US data to page 3
Johns Hopkins University’s CSSE Data
The COVID-19 data objects are loaded below from various sources. Some of these are updated daily, others are pre-existing datasets for geographic locators or other variables to aid in visualizing/mapping.
Import time series data from Github
This imports data from the CSSEGISandData. These are in the csse_covid_19_time_series folder. These files are updated daily.
The following info comes from the README in the time series repo folder:
## Time series summary (csse_covid_19_time_series)
This folder contains daily time series summary tables, including confirmed,
deaths and recovered. All data is read in from the daily case report.
The time series tables are subject to be updated if inaccuracies are identified
in our historical data. The daily reports will not be adjusted in these
instances to maintain a record of raw data.
Two time series tables are for the US confirmed cases and deaths, reported at
the county level. They are named `time_series_covid19_confirmed_US.csv`,
`time_series_covid19_deaths_US.csv`, respectively.
Three time series tables are for the global confirmed cases, recovered cases
and deaths. Australia, Canada and China are reported at the province/state
level. Dependencies of the Netherlands, the UK, France and Denmark are listed
under the province/state level. The US and other countries are at the country
level. The tables are renamed `time_series_covid19_confirmed_global.csv` and
`time_series_covid19_deaths_global.csv`, and
`time_series_covid19_recovered_global.csv`, respectively.
### Update frequency
Once a day around 23:59 (UTC).
### Deprecated warning
The files below were archived here, and will no longer be updated. With the
release of the new data structure, we are updating our time series tables to
reflect these changes.
Please reference `time_series_covid19_confirmed_global.csv` and
`time_series_covid19_deaths_global.csv` for the latest time series data.
`time_series_19-covid-Confirmed.csv`
`time_series_19-covid-Deaths.csv`
`time_series_19-covid-Recovered.csv`Export the raw csse covid 19 time series data (global)
All of the raw data files are in this list
#> [1] "GDPRaw" "TSConfirmedRaw" "TSConfirmedUSRaw" "TSDeathsRaw"
#> [5] "TSDeathsUSRaw" "TSRecoveredRaw"
I’ve dropped each datasets into a raw folder with a date stamp for safe keeping.
#> data/raw/2020-07-11
#> ├── 2020-07-11-Covus.csv
#> ├── 2020-07-11-DeathsMapCovus.csv
#> ├── 2020-07-11-GDPRaw.csv
#> ├── 2020-07-11-MapCovus.csv
#> ├── 2020-07-11-NYTCovState.csv
#> ├── 2020-07-11-PosMapCovus.csv
#> ├── 2020-07-11-TSConfirmedRaw.csv
#> ├── 2020-07-11-TSConfirmedUSRaw.csv
#> ├── 2020-07-11-TSDeathsRaw.csv
#> ├── 2020-07-11-TSDeathsUSRaw.csv
#> ├── 2020-07-11-TSRecoveredRaw.csv
#> ├── 2020-07-11-TidyCovCaseData.csv
#> ├── 2020-07-11-TidyCovDeathData.csv
#> └── 2020-07-11-TidyPosMapCovus.csv
Wrangling CSSE time series data
The code chunk below runs the script for the wrangling steps to create the data visualizations using the time series CSSE data.
The following steps were taken to wrangle the time series data:
- Convert wide to long (
Confirmed,Recovered,Deaths)
- first I converted
TSConfirmedRawdataset to long form, and converts theDatevariable tomdy()
- Create
WorldTSDataAllby joiningConfirmed,Recovered,Deaths
- This joins the
Confirmed,Recovered, andDeathstogether intoWorldTSDataAll
USTSDataAll= joinConfirmedUSandDeathsUS
- I want to mimic what I did with the
WorldTSDataAlland join these two together. I wantcountry_regionto just be namedcountry, andprovince_stateto just be namedstate. Export these files to processed folder
- Create
SumRegionDate
- this groups the
WorldTSDataAlldata bycountry_regionanddate, then summarizes theconfirmed_sum,recovered_sum, anddeaths_sumvariables. Then it creates a “new case” column withdplyr::lag()withconfirmed_sumandfiltersthedatetomax(date)
- create a most recent day from
SumRegionDatecalledrecent_day
- GDP Country Codes: create a smaller version of the
GDPRawdataset. I also rename some of theregionsinGdp2016
#> # A tibble: 1 x 4
#> region code year country_region
#> <chr> <chr> <dbl> <chr>
#> 1 Gambia, The GMB 2016 Gambia
- Create
SumRegionDateCodesby joiningSumRegionDateandGdp2016
- Join the
SumRegionDateto theGdp2016data country. And because this is the first complete dataset I will be using for data visualizations, I will export this into thedata/processedfolder
fs::dir_ls(paste0("data/processed/",
base::noquote(lubridate::today())),
regexp = "SumRegionDate.csv")#> data/processed/2020-07-11/2020-07-11-SumRegionDate.csv
Dashboard layout
This section covers the data visualizations in the dashboard. Because the dashboard is built using multiple pages, with orientation: columns and vertical_layout: fill. Read more about this layout here.
Page 1: Global COVID-19 Data
Page 1 is titled, Global COVID-19 Data
Column 1: valueBox()s (.bg-secondary)
Column {data-width=400 .bg-secondary}
-----------------------------------------------------------------------
These are built using the following valueBox()s. The dataset WorldTSRecent is below:
This is a tibble with a single row for the most recent stats.
The valueBox() and prettyNum() functions display the objects below:
Worldwide confirmed cases as of…
#> [1] "2020-07-10"
#> [1] 12498467
Worldwide recovered cases as of …
#> [1] "2020-07-10"
#> [1] 6777981
Column 2: Worldwide COVID-19 Confirmed Cases (.tabset)
Column {data-width=600 .tabset}
-----------------------------------------------------------------------
This is the plotly::plot_geo() world map. The dataset that it requires is SumRegionDateCodes, and it’s available to view below:
The visualization uses the plotly::plot_geo() function, which renders a full interactive globe!
# create recent_day
recent_day <- max(SumRegionDateCodes$date)
# Set country boundaries as light gray
line <- list(color = toRGB("#d1d1d1"), width = 0.2)
# create geo for map options
geo <- list(
bgcolor = "#FFFAF0",
showframe = FALSE,
showcoastlines = FALSE,
# this is the globe option
projection = list(type = "orthographic"),
resolution = "100",
showcountries = TRUE,
showocean = TRUE,
showlakes = FALSE,
showrivers = FALSE)
geo_map_confirm_cases <- plotly::plot_geo() %>%
layout(
geo = geo,
paper_bgcolor = "#FFFAF0",
title = paste0("World COVID-19 Confirmed Cases as of ",
recent_day)) %>%
add_trace(
data = SumRegionDateCodes,
z = ~Confirmed,
color = ~Confirmed,
colors = "Reds",
text = ~country_region,
locations = ~code,
marker = list(line = line))
geo_map_confirm_casesColumn 3: Worldwide COVID-19 Deaths (.tabset)
This is a new Mercator map of the confirmed deaths from COVID.
# Set country boundaries as light gray
line <- list(color = toRGB("#d1d1d1"), width = 0.2)
# create geo for map options
# c("#B0E0E6", "#F0FFF0")
geo <- list(
oceancolor = "#B0E0E6",
showframe = FALSE,
showcoastlines = FALSE,
# this is the mercator option
projection = list(type = 'Mercator'),
resolution = "100",
showcountries = TRUE,
showocean = TRUE,
showlakes = FALSE,
showrivers = FALSE)
geo_map_deaths <- plotly::plot_geo() %>%
layout(
geo = geo,
paper_bgcolor = "#e8f7fc",
title = paste0("World COVID-19 Deaths as of ",
recent_day)) %>%
add_trace(
data = SumRegionDateCodes,
z = ~Deaths,
color = ~Deaths,
colors = "Greys",
text = ~country_region,
locations = ~code,
marker = list(line = line))
geo_map_deathsColumn 4: Worldwide COVID-19 Cases (Cumulative) (.tabset)
- Create
WorldTSDataAllDateandWorldTSDataRecent
WorldTSDataAllDateisWorldTSDataAllgrouped bydate
- This groups by the
datecolumn, the summarized theconfirmed,deaths, andrecovered
- Create
WorldTSDataAllDateLongfromWorldTSDataAllDate
- Now I restructure (pivot) to create
WorldTSDataAllDateLong. These data are exported into the processed data folder
fs::dir_ls(paste0("data/processed/",
base::noquote(lubridate::today())),
regexp = "WorldTSDataAllDateLong.csv")#> data/processed/2020-07-11/2020-07-11-WorldTSDataAllDateLong.csv
Now plot with the dataset WorldTSDataAllDateLong
# set colors
colors <- c("red1", "gray1", "springgreen2")
# font style
font_style <- list(
family = "Ubuntu",
size = 14,
color = 'black')
# create base chart
world_cum_point_chart <- WorldTSDataAllDateLong %>%
ggplot2::ggplot(aes(x = date,
y = cases,
color = status)) +
geom_point(size = 1, alpha = 2/5) +
scale_color_manual(values = colors) +
scale_y_continuous(labels = scales::label_number_si(accuracy = 0.1)) +
theme(
plot.margin = margin(0, 0, 0, 0, "pt"),
panel.background = element_rect(fill = "White"),
legend.position = "left",
axis.title = element_blank(),
axis.ticks = element_blank()) +
hrbrthemes::theme_ipsum_tw(plot_title_family = "Ubuntu") +
labs(title = "Worldwide COVID-19 Cumulative Cases",
y = "Cases",
x = "Date",
color = " ")
# pass over to plotly
ggplotly(world_cum_point_chart) %>%
plotly::layout(legend = list(orientation = "h"),
font = font_style)Column 4: Worldwide COVID-19 Dataset (.tabset)
This is a display of the data used in the maps.
This creates the searchable data table (DT::datatable).
SumRegionDateCodes %>%
dplyr::select(`Country region` = country_region,
`Country code` = code,
Date = date,
Confirmed,
`New Cases`,
Recovered,
Deaths) %>%
dplyr::arrange(desc(Confirmed)) %>%
DT::datatable(
rownames = FALSE,
fillContainer = TRUE,
options = list(
bPaginate = FALSE
)
)Page 2: Daily Worldwide COVID-19 Cases
This page is titled, “Daily Worldwide COVID-19 Cases”
Column 1: valueBox()s (.bg-secondary)
New worldwide COVID-19 cases as of…
#> [1] 229949
#> [1] "2020-07-10"
Column 2: Daily Worldwide COVID-19 Cases (Animated)
Below we create two new graphs with gganimate using global and national data.
- Create World data (animated) =
WorldTSIncrementLong
- This requires an
incrementaldataset that calculates newcases,deaths, andrecoveredpatients withdplyr::lag()
status variable - dplyr::lag(status variable, 1)
- Here we filter the
WorldTSDataAllto only the US (WorldTSDataUS) and renameprovince_stateandcountry_region
country_region=="US"
- Create USA data (animated) =
USTSDataAllIncrementLong
- then I create an incremental dataset for US states by grouping by
state, calculating thelag(betweenmetric - dplyr::lag(metric)), then summarizing bydate
- Now we use the incremental dataset to animate the
ggplotusinggganimate
- finally I build the animation, first with
ggplot2, then pass the plot object togganimate::transition_reveal(date)and assign theggplot2::labs()
colors <- c("red1", "gray1", "springgreen2")
world_cum_cases <-
WorldTSIncrementLong %>%
ggplot2::ggplot(mapping = aes(x = date,
y = increment,
group = case,
color = case)) +
ggplot2::geom_line(show.legend = FALSE,) +
ggplot2::scale_y_continuous(labels = scales::label_number_si(accuracy = 1)) +
ggplot2::scale_color_manual(values = colors) +
ggplot2::geom_segment(aes(xend = max(date) + 1,
yend = increment),
linetype = 2,
colour = "grey1",
show.legend = FALSE) +
ggplot2::geom_text(aes(x = max(date) + 1,
label = case),
show.legend = FALSE,
hjust = 0) +
hrbrthemes::theme_ipsum_tw(base_size = 10) +
# set the coordinates
ggplot2::coord_cartesian(
xlim = c(min(WorldTSIncrementLong$date),
max(WorldTSIncrementLong$date) + 7),
ylim = c(max(0, min(WorldTSIncrementLong$increment)),
max(WorldTSIncrementLong$increment)),
clip = "off") +
ggplot2::theme(legend.position = c(0.1, 0.8),
axis.title.x = element_blank()) +
ggplot2::guides(size = FALSE) +
ggplot2::geom_point(aes(size = increment),
alpha = 0.7,
show.legend = FALSE) +
ggplot2::scale_size(range = c(2, 10)) +
gganimate::transition_reveal(date) +
ggplot2::labs(title = "New Global COVID-19 Cases",
subtitle = "Date: {frame_along}",
y = "New daily cases",
x = "Date")
animate_world_cum_cases <- gganimate::animate(world_cum_cases, nframes = 150,
fps = 10,
renderer = gifski_renderer(loop = TRUE))Now we animate and save.
# and save
gganimate::anim_save(filename =
base::paste0(base::noquote(lubridate::today()),
"-animate_world_cum_cases.gif"),
animation = last_animation(),
path = "figs/")knitr::include_graphics(path =
base::paste0("figs/",
base::noquote(lubridate::today()),
"-animate_world_cum_cases.gif"))Page 3: United States COVID-19 Data
This page is the US time-series COVID-19 data.
Column 1: valueBox()s (.bg-secondary)
New cases as of
#> [1] "2020-07-10"
#> [1] 66627
Confirmed cases as of
#> [1] "2020-07-10"
#> [1] 3184573
Recovered cases as of
#> [1] "2020-07-10"
#> [1] 983185
Deaths as of
#> [1] "2020-07-10"
#> [1] 134092
Column 2: United States Confirmed Cases (.tabset)
This is a map of the COVID-19 data for the US using the county-level data from the USTSDataAll dataset, but we need to reduce this to the 51 states in the continental US.
The data for these are stored in the confirmed_us vector.
#> Alabama Alaska Arizona Arkansas California Colorado
#> 50508 1323 116892 26803 310885 36173
#> Connecticut Delaware Florida Georgia
#> 47287 12652 244151 111217
These are paired with the state.name and state.abb vectors from the maps package. Read more about how to create these in the plotly-r book.
us_map_layout <- list(
scope = 'usa',
lakecolor = "#3399FF",
projection = list(type = 'albers usa'))
plot_geo() %>%
add_trace(
z = confirmed_us,
text = state.name,
span = I(0),
locations = state.abb,
locationmode = 'USA-states') %>%
layout(geo = us_map_layout,
title = "Current US Confirmed Cases")The same map is created below using the total confirmed deaths.
#> Alabama Alaska Arizona Arkansas California Colorado
#> 1104 17 2082 313 6955 1724
#> Connecticut Delaware Florida Georgia
#> 4348 517 4102 2965
United States Deaths
Column 3: United States COVID-19 Cases (Animated) (.tabset)
Now we can create the animated plot with ggplot2, specify the animation (gganimate::transition_reveal(date)), and pass it to the animation.
us_cum_cases <- USTSDataAllIncrementLong %>%
ggplot2::ggplot(aes(x = date,
y = increment,
group = case,
color = case)) +
# add line
ggplot2::geom_line(show.legend = FALSE) +
ggplot2::scale_color_manual(values = c("red1", "black", "dodgerblue")) +
# add segment, no legend
ggplot2::geom_segment(aes(xend = max(date) + 1,
yend = increment),
linetype = 2,
color = "grey",
show.legend = FALSE) +
# add text, no legend
ggplot2::geom_text(aes(x = max(date) + 1,
label = case),
hjust = 0,
show.legend = FALSE) +
# set theme
hrbrthemes::theme_ipsum_tw(base_size = 10) +
# set cartesian coordinates to min/max dates
ggplot2::coord_cartesian(xlim = c(min(USTSDataAllIncrementLong$date),
max(USTSDataAllIncrementLong$date) + 7),
clip = "off") +
# position the legend
ggplot2::theme(legend.position = c(0.1, 0.8),
# no x axis title
axis.title.x = element_blank()) +
# no guides
ggplot2::guides(size = FALSE) +
ggplot2::geom_point(aes(size = increment),
alpha = 0.7,
show.legend = FALSE) +
ggplot2::scale_size(range = c(2, 10)) +
# set transition
gganimate::transition_reveal(date) +
# assign labs
ggplot2::labs(title = "US COVID-19 Cases",
subtitle = "at date: {frame_along}",
y = "New Cases",
color = "Status",
x = "Date")
animated_us_cum_cases <- gganimate::animate(us_cum_cases,
nframes = 150,
fps = 10,
renderer = gifski_renderer(loop = TRUE))# and save
gganimate::anim_save(filename =
base::paste0(base::noquote(lubridate::today()),
"-animated_us_cum_cases.gif"),
animation = last_animation(),
path = "figs/")knitr::include_graphics(path =
base::paste0("figs/",
base::noquote(lubridate::today()),
"-animated_us_cum_cases.gif"))Data Tables
These are the datasets currently displayed on last tab of the dashboard.
The US Data
Now we use the WorldTSDataUS for a searchable DT::datatable.
The covdata package
We’ll be using data from the covdata package by Kieran Healy. The goal with this package is to build an set of graphs that the user can select an input (selectInput()) from a list of metrics, and see that metric reflected across all 50 states.
The datasets we’ll be using are covus and nytcovstate. The script below imports and wrangles these data.
The covus data
This is a tidy dataset, with a date for each day, and each metric in the measure variable.
#> Rows: 102,735
#> Columns: 16
#> $ date <date> 2020-07-05, 2020-07-05, 2020-07-05, 2020-07-05,…
#> $ state <chr> "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", …
#> $ fips <chr> "02", "02", "02", "02", "02", "02", "02", "02", …
#> $ data_quality_grade <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A"…
#> $ measure <chr> "positive", "negative", "pending", "hospitalized…
#> $ count <dbl> 1138, 122615, NA, 19, NA, NA, NA, 3, NA, 548, 16…
#> $ measure_label <chr> "Positive Tests", "Negative Tests", "Pending Tes…
#> $ day <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
#> $ yr <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, …
#> $ week <dbl> 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, …
#> $ week_year <date> 2020-07-05, 2020-07-05, 2020-07-05, 2020-07-05,…
#> $ month_lbl <ord> Jul, Jul, Jul, Jul, Jul, Jul, Jul, Jul, Jul, Jul…
#> $ month <dbl> 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, …
#> $ floor_month <date> 2020-07-01, 2020-07-01, 2020-07-01, 2020-07-01,…
#> $ qtr <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, …
#> $ days_elapsed <drtn> 121 days, 121 days, 121 days, 121 days, 121 day…
A cleaner version of the measure variable is stored in the measure_label variable.
Now we add the necessary date variables grouped by state, remove the regions not included in the geofacet, and put these in a dataset called MapCovus.
The TidyCovDeathData data
These data are used to compare deaths between NYT and COVID-tracking project.
Geofacets
We’ve covered how to create the geofacet graphs in this storybench post.
We will start by building a map with the positive tests in it (see the output below).
PosMapCovus %>%
dplyr::filter(state == "CA") %>%
ggplot2::ggplot(aes(x = days_elapsed,
y = `positive tests`)) +
geom_col(alpha = 1/10) +
ggplot2::labs(title = "California's new COVID cases",
subtitle = paste0("Total COVID-19 positive tests between ",
min(PosMapCovus$date),
" and ",
max(PosMapCovus$date)),
y = "Positive Tests",
x = "Days") +
hrbrthemes::theme_ipsum_rc()Test California with columns and lines.
# get California data
TidyPosMapCA <- TidyPosMapCovus %>% dplyr::filter(state == "CA")
# plot columns
ca_pos_col_plot <- PosMapCovus %>%
dplyr::filter(state == "CA") %>%
ggplot2::ggplot(aes(x = days_elapsed,
y = `positive tests`,
group = date)) +
geom_col(alpha = 1/10, linetype = 0, fill = "grey50")
ca_pos_col_plot +
# add the lines
ggplot2::geom_line(data = TidyPosMapCA,
mapping = aes(x = days_elapsed,
y = `Positive Test Value`,
group = `Positive Test Metric`,
color = `Positive Test Metric`),
show.legend = TRUE) +
ggplot2::labs(title = "California's new COVID cases",
subtitle = paste0("Total COVID-19 positive tests between ",
min(PosMapCovus$date),
" and ",
max(PosMapCovus$date)),
y = "Positive Tests",
x = "Days") +
hrbrthemes::theme_ipsum_rc()US Positive Tests (COVID Tracking project)
Extend this to geofacet.
geofacet_pos <- PosMapCovus %>%
ggplot2::ggplot(aes(x = days_elapsed,
y = `positive tests`,
group = date)) +
geom_col(alpha = 2/10,
linetype = 0) +
ggplot2::geom_line(data = TidyPosMapCovus,
mapping = aes(x = days_elapsed,
y = `Positive Test Value`,
group = `Positive Test Metric`,
color = `Positive Test Metric`),
show.legend = TRUE) +
geofacet::facet_geo( ~ state,
grid = "us_state_grid1",
scales = "free_y") +
ggplot2::labs(title = "US positive COVID tests (7-day rolling average)",
subtitle = paste0("Between ",
min(PosMapCovus$date),
" and ",
max(PosMapCovus$date)),
y = "New Positive Tests",
x = "Days Elapsed") +
ggplot2::theme(axis.text.x = element_text(angle = 315)) +
ggplot2::theme_bw()
geofacet_posUS Cases (NYT vs COVID tracking project)
I want to compare the NYT and COVID-19 tracking datasets (positive tests vs. NYT cases). This can be accomplished using the TidyCovCaseData which has both cases from the NYT dataset, and the positive measure from the COVID tracking project dataset.
geofacet_cases <- TidyCovCaseData %>%
ggplot2::ggplot(aes(x = days_elapsed,
y = `Cases Value`,
group = `Cases Key`,
color = `Cases Key`)) +
ggplot2::geom_line(show.legend = TRUE) +
geofacet::facet_geo( ~ state,
grid = "us_state_grid1",
scales = "free_y") +
ggplot2::labs(title = "US COVID Cases (NYT vs. COVID tracking)",
subtitle = paste0("Between ",
min(TidyCovCaseData$date),
" and ",
max(TidyCovCaseData$date)),
y = "New Cases",
x = "Days Elapsed") +
ggplot2::theme(axis.text.x = element_text(angle = 315)) +
ggplot2::theme_minimal()
geofacet_casesUS Deaths (NYT vs COVID tracking project)
Finally, we can repeat this process, but compare deaths between the NYT and COVID-tracking datasets (which later we can turn into a selectInput).
geofacet_deaths <- TidyCovDeathData %>%
ggplot2::ggplot(aes(x = days_elapsed,
y = `Death Value`,
group = `Death Key`,
color = `Death Key`)) +
ggplot2::geom_line(show.legend = TRUE) +
geofacet::facet_geo( ~ state,
grid = "us_state_grid1",
scales = "free_y") +
ggplot2::labs(title = "US COVID deaths (NYT vs. COVID tracking)",
subtitle = paste0("Between ",
min(DeathsMapCovus$date),
" and ",
max(DeathsMapCovus$date)),
y = "New Positive Tests",
x = "Days Elapsed") +
ggplot2::theme(axis.text.x = element_text(angle = 315)) +
ggplot2::theme_minimal()
geofacet_deaths